T-REX: A 68–567 µs/token, 0.41–3.95 µJ/token Transformer Accelerator with Reduced External Memory Access and Enhanced Hardware Utilization in 16nm FinFET
Seunghyun Moon, Mao Li, Gregory Chen, Phil Knag, Ram Krishnamurthy, Mingoo Seok
This work introduces novel training-time and post-training compression schemes that reduce external memory access during transformer model inference. In addition, a new control-flow mechanism, dynamic batching, and a novel buffer architecture, the two-direction accessible register file, further reduce external memory access while improving hardware utilization.
arXiv:2503.00322
Country:
- North America > United States > Oregon > Washington County > Hillsboro (0.04)
- North America > United States > New York > New York County > New York City (0.04)
Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)